Overview

Dataset statistics

Number of variables: 28
Number of observations: 431,290
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 83.9 MiB
Average record size in memory: 204.0 B
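The missing-cell and duplicate-row figures are simple counts over the table. A minimal sketch using only the standard library, assuming rows are held as tuples and that only `None` counts as missing; placeholder strings such as "NA" (which several categorical variables below contain, while this report shows 0 missing cells) are not counted. The example rows are hypothetical.

```python
from collections import Counter
from typing import Sequence


def dataset_stats(rows: Sequence[tuple]) -> dict:
    """Missing-cell and duplicate-row statistics for equal-length rows."""
    n_obs = len(rows)
    n_vars = len(rows[0]) if rows else 0
    n_cells = n_obs * n_vars
    # Only None counts as missing; placeholder strings such as "NA" do not.
    missing = sum(cell is None for row in rows for cell in row)
    # Each repeat of a row after its first occurrence is one duplicate.
    duplicates = sum(count - 1 for count in Counter(rows).values())
    return {
        "n_vars": n_vars,
        "n_obs": n_obs,
        "missing_cells": missing,
        "missing_cells_pct": 100.0 * missing / n_cells if n_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_rows_pct": 100.0 * duplicates / n_obs if n_obs else 0.0,
    }


# Hypothetical rows (year, country, computer), not taken from the dataset.
rows = [
    (2015, "BRA", "yes"),
    (2015, "CAN", "NA"),
    (2015, "BRA", "yes"),
    (2015, None, "no"),
]
print(dataset_stats(rows))
```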

Variable types

Categorical: 19
Numeric: 9

Alerts

year has constant value "2015" [Constant]
dishwasher has constant value "NA" [Constant]
country has a high cardinality: 60 distinct values [High cardinality]
wealth has a high cardinality: 47482 distinct values [High cardinality]
escs has a high cardinality: 52733 distinct values [High cardinality]
country_2 has a high cardinality: 60 distinct values [High cardinality]
country_name has a high cardinality: 60 distinct values [High cardinality]
country_3 has a high cardinality: 60 distinct values [High cardinality]
school_id is highly correlated with country and 7 other fields [High correlation]
student_id is highly correlated with country and 7 other fields [High correlation]
math is highly correlated with read and 1 other field [High correlation]
read is highly correlated with math and 1 other field [High correlation]
science is highly correlated with math and 1 other field [High correlation]
stu_wgt is highly correlated with country and 4 other fields [High correlation]
rank is highly correlated with country and 9 other fields [High correlation]
finalIq is highly correlated with country and 9 other fields [High correlation]
pop2021 is highly correlated with country and 8 other fields [High correlation]
year is highly correlated with room and 15 other fields [High correlation]
country is highly correlated with school_id and 18 other fields [High correlation]
mother_educ is highly correlated with country and 8 other fields [High correlation]
father_educ is highly correlated with country and 8 other fields [High correlation]
gender is highly correlated with dishwasher and 1 other field [High correlation]
computer is highly correlated with country and 12 other fields [High correlation]
internet is highly correlated with country and 12 other fields [High correlation]
desk is highly correlated with country and 12 other fields [High correlation]
room is highly correlated with country and 11 other fields [High correlation]
dishwasher is highly correlated with room and 15 other fields [High correlation]
television is highly correlated with country and 10 other fields [High correlation]
computer_n is highly correlated with country and 12 other fields [High correlation]
car is highly correlated with country and 10 other fields [High correlation]
book is highly correlated with country and 9 other fields [High correlation]
country_2 is highly correlated with country and 18 other fields [High correlation]
country_name is highly correlated with country and 18 other fields [High correlation]
country_3 is highly correlated with country and 18 other fields [High correlation]
student_id has unique values [Unique]
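The Constant, Unique and High cardinality alerts all follow from the number of distinct values in a column. A minimal sketch, assuming a high-cardinality threshold of 50 distinct values applied to categorical columns only; that threshold is my reading of the pandas-profiling default and is consistent with country (60 values) being flagged while mother_educ (6 values) is not. The function name and message format are illustrative.

```python
from typing import Hashable, Sequence


def column_alerts(name: str, values: Sequence[Hashable], is_categorical: bool = True,
                  cardinality_threshold: int = 50) -> list:
    """Constant / Unique / High cardinality alerts for one column (a sketch)."""
    n_distinct = len(set(values))
    alerts = []
    if n_distinct == 1:
        alerts.append(f'{name} has constant value "{values[0]}" [Constant]')
    elif n_distinct == len(values):
        alerts.append(f"{name} has unique values [Unique]")
    # The threshold of 50 is an assumed default, applied to categorical columns only.
    if is_categorical and n_distinct > cardinality_threshold:
        alerts.append(
            f"{name} has a high cardinality: {n_distinct} distinct values [High cardinality]"
        )
    return alerts


print(column_alerts("year", ["2015"] * 5))
print(column_alerts("student_id", [800001, 800002, 800003], is_categorical=False))
print(column_alerts("country", [f"C{i:02d}" for i in range(60)] * 2))
```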

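Many of the High correlation alerts above pair two categorical variables. One common measure of association for such pairs is Cramér's V, V = sqrt(chi2 / (n * (k - 1))), where k is the smaller of the two category counts. The sketch below is a plain, uncorrected implementation from a contingency table. It is not necessarily the exact measure or threshold pandas-profiling v3.4.0 used to raise these alerts.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def cramers_v(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Cramér's V between two categorical sequences (no bias correction)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    row_totals = Counter(x)
    col_totals = Counter(y)
    observed = Counter(zip(x, y))
    # Pearson chi-square statistic over every cell of the contingency table.
    chi2 = 0.0
    for a, row_total in row_totals.items():
        for b, col_total in col_totals.items():
            expected = row_total * col_total / n
            chi2 += (observed[(a, b)] - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals))
    if k < 2:
        return 0.0  # a constant variable carries no association information
    return math.sqrt(chi2 / (n * (k - 1)))


print(cramers_v(["a", "a", "b", "b"], ["x", "x", "y", "y"]))  # 1.0
print(cramers_v(["a", "a", "b", "b"], ["x", "y", "x", "y"]))  # 0.0
```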
Reproduction

Analysis started: 2022-10-28 18:15:04.814920
Analysis finished: 2022-10-28 18:20:35.540290
Duration: 5 minutes and 30.73 seconds
Software version: pandas-profiling v3.4.0
Download configuration: config.json

Variables

year
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.3 MiB
2015: 431290

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Characters and Unicode

Total characters: 1,725,160
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2015
2nd row: 2015
3rd row: 2015
4th row: 2015
5th row: 2015

Common Values

Value | Count | Frequency (%)
2015 | 431290 | 100.0%

Length

[Figure: Histogram of lengths of the category]

[Figure: Category Frequency Plot]

Words

Value | Count | Frequency (%)
2015 | 431290 | 100.0%

Most occurring characters

Value | Count | Frequency (%)
2 | 431290 | 25.0%
0 | 431290 | 25.0%
1 | 431290 | 25.0%
5 | 431290 | 25.0%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 1725160 | 100.0%

Most frequent character per category

Decimal Number
Same as Most occurring characters above.

Most occurring scripts

Value | Count | Frequency (%)
Common | 1725160 | 100.0%

Most frequent character per script

Common
Same as Most occurring characters above.

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1725160 | 100.0%

Most frequent character per block

ASCII
Same as Most occurring characters above.

country
Categorical

HIGH CARDINALITY
HIGH CORRELATION

Distinct: 60
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.3 MiB
BRA: 23141
CAN: 20058
AUS: 14530
ARE: 14167
GBR: 14157
Other values (55): 345237

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 1,293,870
Distinct characters: 26
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: ISR
2nd row: ISR
3rd row: ISR
4th row: ISR
5th row: ISR

Common Values

Value | Count | Frequency (%)
BRA | 23141 | 5.4%
CAN | 20058 | 4.7%
AUS | 14530 | 3.4%
ARE | 14167 | 3.3%
GBR | 14157 | 3.3%
QAT | 12083 | 2.8%
COL | 11795 | 2.7%
ITA | 11583 | 2.7%
BEL | 9651 | 2.2%
THA | 8249 | 1.9%
Other values (50) | 291876 | 67.7%

Length

[Figure: Histogram of lengths of the category]

Words

Value | Count | Frequency (%)
bra | 23141 | 5.4%
can | 20058 | 4.7%
aus | 14530 | 3.4%
are | 14167 | 3.3%
gbr | 14157 | 3.3%
qat | 12083 | 2.8%
col | 11795 | 2.7%
ita | 11583 | 2.7%
bel | 9651 | 2.2%
tha | 8249 | 1.9%
Other values (50) | 291876 | 67.7%

Most occurring characters

Value | Count | Frequency (%)
R | 149516 | 11.6%
A | 143566 | 11.1%
N | 95098 | 7.3%
L | 82082 | 6.3%
E | 79483 | 6.1%
U | 79479 | 6.1%
T | 73263 | 5.7%
S | 72899 | 5.6%
B | 62638 | 4.8%
C | 57164 | 4.4%
Other values (16) | 398682 | 30.8%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 1293870 | 100.0%

Most frequent character per category

Uppercase Letter
Same as Most occurring characters above.

Most occurring scripts

Value | Count | Frequency (%)
Latin | 1293870 | 100.0%

Most frequent character per script

Latin
Same as Most occurring characters above.

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1293870 | 100.0%

Most frequent character per block

ASCII
Same as Most occurring characters above.

school_id
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 15386
Distinct (%): 3.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 41570919.19
Minimum: 800001
Maximum: 85800222
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.6 MiB

Quantile statistics

Minimum: 800001
5th percentile: 3600578
Q1: 17000312
Median: 40000005
Q3: 64300023
95th percentile: 82600178
Maximum: 85800222
Range: 85000221
Interquartile range (IQR): 47299711
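The quantile statistics are order statistics with linear interpolation, and Range and IQR are differences of them. A sketch using `statistics.quantiles(..., n=20, method="inclusive")`, whose 19 cut points fall at 5%, 10%, ..., 95%. As far as I know the inclusive method matches NumPy's default linear interpolation used by pandas; that pandas-profiling computes the 5th and 95th percentiles exactly this way is an assumption.

```python
import statistics
from typing import Sequence


def quantile_summary(data: Sequence[float]) -> dict:
    # n=20 yields the 19 cut points at 5%, 10%, ..., 95%.
    cuts = statistics.quantiles(data, n=20, method="inclusive")
    q1, median, q3 = cuts[4], cuts[9], cuts[14]
    lo, hi = min(data), max(data)
    return {
        "minimum": lo,
        "p5": cuts[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "p95": cuts[18],
        "maximum": hi,
        "range": hi - lo,
        "iqr": q3 - q1,
    }


print(quantile_summary(list(range(1, 22))))  # the integers 1..21
```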

Descriptive statistics

Standard deviation: 26260221.53
Coefficient of variation (CV): 0.631696918
Kurtosis: -1.316732116
Mean: 41570919.19
Median Absolute Deviation (MAD): 23400100
Skewness: 0.08255531682
Sum: 1.792912174 × 10^13
Variance: 6.895992347 × 10^14
Monotonicity: Not monotonic
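The descriptive statistics can be sketched in the same way. Skewness and kurtosis below are the bias-adjusted sample estimators (adjusted Fisher-Pearson skewness and excess kurtosis), which to my knowledge are the formulas behind pandas' `Series.skew()` and `Series.kurt()`; MAD is taken to be the median absolute deviation from the median. The monotonicity labels are simplified and may not match the report's wording for strictly monotonic data.

```python
import math
import statistics
from typing import Sequence


def descriptive(data: Sequence[float]) -> dict:
    """Descriptive statistics for non-constant data with at least 4 values."""
    values = list(data)
    n = len(values)
    mean = statistics.fmean(values)
    variance = statistics.variance(values)  # sample variance, n - 1 denominator
    std = math.sqrt(variance)
    median = statistics.median(values)
    # Median absolute deviation from the median.
    mad = statistics.median([abs(x - median) for x in values])
    # Central moments with an n denominator.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3.0
    # Bias-adjusted sample skewness (G1) and excess kurtosis (G2).
    skewness = math.sqrt(n * (n - 1)) / (n - 2) * g1
    kurtosis = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0)
    pairs = list(zip(values, values[1:]))
    if all(a <= b for a, b in pairs):
        monotonicity = "Increasing"
    elif all(a >= b for a, b in pairs):
        monotonicity = "Decreasing"
    else:
        monotonicity = "Not monotonic"
    return {
        "mean": mean, "std": std, "variance": variance, "cv": std / mean,
        "mad": mad, "skewness": skewness, "kurtosis": kurtosis,
        "sum": sum(values), "monotonicity": monotonicity,
    }


print(descriptive([1, 2, 3, 4, 5]))
```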
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
63400102 | 635 | 0.1%
44200001 | 293 | 0.1%
44200016 | 258 | 0.1%
44200024 | 256 | 0.1%
63400051 | 246 | 0.1%
63400012 | 244 | 0.1%
44200038 | 231 | 0.1%
63400152 | 226 | 0.1%
63400076 | 226 | 0.1%
49900036 | 226 | 0.1%
Other values (15376) | 428449 | 99.3%

Minimum 10 values

Value | Count | Frequency (%)
800001 | 32 | < 0.1%
800002 | 2 | < 0.1%
800003 | 6 | < 0.1%
800004 | 29 | < 0.1%
800005 | 33 | < 0.1%
800006 | 4 | < 0.1%
800007 | 25 | < 0.1%
800008 | 11 | < 0.1%
800009 | 9 | < 0.1%
800010 | 33 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
85800222 | 37 | < 0.1%
85800221 | 32 | < 0.1%
85800220 | 24 | < 0.1%
85800219 | 18 | < 0.1%
85800218 | 18 | < 0.1%
85800217 | 34 | < 0.1%
85800216 | 36 | < 0.1%
85800215 | 36 | < 0.1%
85800214 | 33 | < 0.1%
85800213 | 21 | < 0.1%

student_id
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct: 431290
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 41576510.58
Minimum: 800001
Maximum: 85807641
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.6 MiB

Quantile statistics

Minimum: 800001
5th percentile: 3615411.9
Q1: 17011121.25
Median: 40000143.5
Q3: 64300611.75
95th percentile: 82605243.55
Maximum: 85807641
Range: 85007640
Interquartile range (IQR): 47289490.5

Descriptive statistics

Standard deviation: 26259080.54
Coefficient of variation (CV): 0.6315845215
Kurtosis: -1.316711293
Mean: 41576510.58
Median Absolute Deviation (MAD): 23408606
Skewness: 0.08267475679
Sum: 1.793153325 × 10^13
Variance: 6.895393109 × 10^14
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
37603048 | 1 | < 0.1%
5603017 | 1 | < 0.1%
5603893 | 1 | < 0.1%
5603441 | 1 | < 0.1%
5600699 | 1 | < 0.1%
5600100 | 1 | < 0.1%
5605890 | 1 | < 0.1%
5601592 | 1 | < 0.1%
5604675 | 1 | < 0.1%
5606346 | 1 | < 0.1%
Other values (431280) | 431280 | > 99.9%

Minimum 10 values

Value | Count | Frequency (%)
800001 | 1 | < 0.1%
800002 | 1 | < 0.1%
800003 | 1 | < 0.1%
800004 | 1 | < 0.1%
800005 | 1 | < 0.1%
800006 | 1 | < 0.1%
800007 | 1 | < 0.1%
800008 | 1 | < 0.1%
800009 | 1 | < 0.1%
800010 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
85807641 | 1 | < 0.1%
85807640 | 1 | < 0.1%
85807639 | 1 | < 0.1%
85807638 | 1 | < 0.1%
85807637 | 1 | < 0.1%
85807635 | 1 | < 0.1%
85807634 | 1 | < 0.1%
85807633 | 1 | < 0.1%
85807632 | 1 | < 0.1%
85807631 | 1 | < 0.1%

mother_educ
Categorical

HIGH CORRELATION

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.3 MiB
ISCED 3A: 217087
ISCED 3B, C: 76868
ISCED 2: 70227
ISCED 1: 27845
NA: 24708

Length

Max length: 16
Median length: 8
Mean length: 8.233541237
Min length: 2

Characters and Unicode

Total characters: 3,551,044
Distinct characters: 20
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: ISCED 3A
2nd row: ISCED 3A
3rd row: NA
4th row: ISCED 3A
5th row: ISCED 3B, C

Common Values

Value | Count | Frequency (%)
ISCED 3A | 217087 | 50.3%
ISCED 3B, C | 76868 | 17.8%
ISCED 2 | 70227 | 16.3%
ISCED 1 | 27845 | 6.5%
NA | 24708 | 5.7%
less than ISCED1 | 14555 | 3.4%

Length

[Figure: Histogram of lengths of the category]

[Figure: Category Frequency Plot]

Words

Value | Count | Frequency (%)
isced | 392027 | 42.2%
3a | 217087 | 23.4%
3b | 76868 | 8.3%
c | 76868 | 8.3%
2 | 70227 | 7.6%
1 | 27845 | 3.0%
na | 24708 | 2.7%
less | 14555 | 1.6%
than | 14555 | 1.6%
isced1 | 14555 | 1.6%
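The lowercased word table is a word count over all values, weighted by how often each value occurs: "ISCED 3B, C" contributes "3b" and "c", and "less than ISCED1" contributes "less", "than" and "isced1". A sketch that rebuilds the mother_educ word counts from the value counts reported above, assuming words are runs of word characters (`\w+`) after lowercasing; the tokenizer pandas-profiling actually uses may differ in edge cases.

```python
import re
from collections import Counter


def word_counts(value_counts: dict) -> Counter:
    """Lowercased word frequencies, each value weighted by its count."""
    words: Counter = Counter()
    for value, count in value_counts.items():
        for word in re.findall(r"\w+", value.lower()):
            words[word] += count
    return words


# Value counts copied from the mother_educ Common Values table.
mother_educ = {
    "ISCED 3A": 217087,
    "ISCED 3B, C": 76868,
    "ISCED 2": 70227,
    "ISCED 1": 27845,
    "NA": 24708,
    "less than ISCED1": 14555,
}
words = word_counts(mother_educ)
total = sum(words.values())
for word, count in words.most_common(3):
    print(f"{word} | {count} | {100 * count / total:.1f}%")
```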

Most occurring characters

Value | Count | Frequency (%)
(space) | 498005 | 14.0%
C | 483450 | 13.6%
I | 406582 | 11.4%
S | 406582 | 11.4%
E | 406582 | 11.4%
D | 406582 | 11.4%
3 | 293955 | 8.3%
A | 241795 | 6.8%
B | 76868 | 2.2%
, | 76868 | 2.2%
Other values (10) | 253775 | 7.1%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 2453149 | 69.1%
Space Separator | 498005 | 14.0%
Decimal Number | 406582 | 11.4%
Lowercase Letter | 116440 | 3.3%
Other Punctuation | 76868 | 2.2%
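The category names are Unicode general categories (Lu is Uppercase Letter, Nd is Decimal Number, Zs is Space Separator, and so on), which Python exposes through `unicodedata.category`. The sketch below reproduces the mother_educ category counts from its value counts. The name mapping covers only the categories that occur in this report, and scripts and blocks are left out because `unicodedata` does not provide them.

```python
import unicodedata
from collections import Counter

# Names for the general categories that appear in this report only.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Sm": "Math Symbol",
}


def category_counts(value_counts: dict) -> Counter:
    """Character counts per Unicode general category, weighted by value count."""
    counts: Counter = Counter()
    for value, n in value_counts.items():
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += n
    return counts


# Value counts copied from the mother_educ Common Values table.
mother_educ = {
    "ISCED 3A": 217087,
    "ISCED 3B, C": 76868,
    "ISCED 2": 70227,
    "ISCED 1": 27845,
    "NA": 24708,
    "less than ISCED1": 14555,
}
cats = category_counts(mother_educ)
total = sum(cats.values())
for name, count in cats.most_common():
    print(f"{name} | {count} | {100 * count / total:.1f}%")
```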

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
C | 483450 | 19.7%
I | 406582 | 16.6%
S | 406582 | 16.6%
E | 406582 | 16.6%
D | 406582 | 16.6%
A | 241795 | 9.9%
B | 76868 | 3.1%
N | 24708 | 1.0%

Lowercase Letter
Value | Count | Frequency (%)
s | 29110 | 25.0%
l | 14555 | 12.5%
e | 14555 | 12.5%
t | 14555 | 12.5%
h | 14555 | 12.5%
a | 14555 | 12.5%
n | 14555 | 12.5%

Decimal Number
Value | Count | Frequency (%)
3 | 293955 | 72.3%
2 | 70227 | 17.3%
1 | 42400 | 10.4%

Space Separator
Value | Count | Frequency (%)
(space) | 498005 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
, | 76868 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 2569589 | 72.4%
Common | 981455 | 27.6%

Most frequent character per script

Latin
Value | Count | Frequency (%)
C | 483450 | 18.8%
I | 406582 | 15.8%
S | 406582 | 15.8%
E | 406582 | 15.8%
D | 406582 | 15.8%
A | 241795 | 9.4%
B | 76868 | 3.0%
s | 29110 | 1.1%
N | 24708 | 1.0%
l | 14555 | 0.6%
Other values (5) | 72775 | 2.8%

Common
Value | Count | Frequency (%)
(space) | 498005 | 50.7%
3 | 293955 | 30.0%
, | 76868 | 7.8%
2 | 70227 | 7.2%
1 | 42400 | 4.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 3551044 | 100.0%

Most frequent character per block

ASCII
Same as Most occurring characters above.

father_educ
Categorical

HIGH CORRELATION

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.3 MiB
ISCED 3A: 197530
ISCED 3B, C: 87009
ISCED 2: 71479
NA: 32041
ISCED 1: 28870

Length

Max length: 16
Median length: 11
Mean length: 8.193187878
Min length: 2

Characters and Unicode

Total characters: 3,533,640
Distinct characters: 20
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: ISCED 3A
2nd row: NA
3rd row: NA
4th row: ISCED 3B, C
5th row: NA

Common Values

Value | Count | Frequency (%)
ISCED 3A | 197530 | 45.8%
ISCED 3B, C | 87009 | 20.2%
ISCED 2 | 71479 | 16.6%
NA | 32041 | 7.4%
ISCED 1 | 28870 | 6.7%
less than ISCED1 | 14361 | 3.3%

Length

[Figure: Histogram of lengths of the category]

[Figure: Category Frequency Plot]

Words

Value | Count | Frequency (%)
isced | 384888 | 41.3%
3a | 197530 | 21.2%
3b | 87009 | 9.3%
c | 87009 | 9.3%
2 | 71479 | 7.7%
na | 32041 | 3.4%
1 | 28870 | 3.1%
less | 14361 | 1.5%
than | 14361 | 1.5%
isced1 | 14361 | 1.5%

Most occurring characters

Value | Count | Frequency (%)
(space) | 500619 | 14.2%
C | 486258 | 13.8%
I | 399249 | 11.3%
S | 399249 | 11.3%
E | 399249 | 11.3%
D | 399249 | 11.3%
3 | 284539 | 8.1%
A | 229571 | 6.5%
B | 87009 | 2.5%
, | 87009 | 2.5%
Other values (10) | 261639 | 7.4%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 2431875 | 68.8%
Space Separator | 500619 | 14.2%
Decimal Number | 399249 | 11.3%
Lowercase Letter | 114888 | 3.3%
Other Punctuation | 87009 | 2.5%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
C | 486258 | 20.0%
I | 399249 | 16.4%
S | 399249 | 16.4%
E | 399249 | 16.4%
D | 399249 | 16.4%
A | 229571 | 9.4%
B | 87009 | 3.6%
N | 32041 | 1.3%

Lowercase Letter
Value | Count | Frequency (%)
s | 28722 | 25.0%
l | 14361 | 12.5%
e | 14361 | 12.5%
t | 14361 | 12.5%
h | 14361 | 12.5%
a | 14361 | 12.5%
n | 14361 | 12.5%

Decimal Number
Value | Count | Frequency (%)
3 | 284539 | 71.3%
2 | 71479 | 17.9%
1 | 43231 | 10.8%

Space Separator
Value | Count | Frequency (%)
(space) | 500619 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
, | 87009 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 2546763 | 72.1%
Common | 986877 | 27.9%

Most frequent character per script

Latin
Value | Count | Frequency (%)
C | 486258 | 19.1%
I | 399249 | 15.7%
S | 399249 | 15.7%
E | 399249 | 15.7%
D | 399249 | 15.7%
A | 229571 | 9.0%
B | 87009 | 3.4%
N | 32041 | 1.3%
s | 28722 | 1.1%
l | 14361 | 0.6%
Other values (5) | 71805 | 2.8%

Common
Value | Count | Frequency (%)
(space) | 500619 | 50.7%
3 | 284539 | 28.8%
, | 87009 | 8.8%
2 | 71479 | 7.2%
1 | 43231 | 4.4%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 3533640 | 100.0%

Most frequent character per block

ASCII
Same as Most occurring characters above.

gender
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.3 MiB
female: 216878
male: 214412

Length

Max length: 6
Median length: 6
Mean length: 5.005717731
Min length: 4
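The length statistics and the character total can be computed from value counts alone: for gender, 6 × 216878 + 4 × 214412 = 2,158,916 characters, and dividing by 431,290 rows gives the mean length. A sketch with a plain count-weighted median; I have not confirmed that this is how the report computes its median (for father_educ, a plain weighted median gives 8, while the report lists 11).

```python
from typing import Dict


def length_summary(value_counts: Dict[str, int]) -> dict:
    """Length statistics of a categorical column given {value: count}."""
    n = sum(value_counts.values())
    # (length, count) pairs sorted by length.
    pairs = sorted((len(value), count) for value, count in value_counts.items())
    total_chars = sum(length * count for length, count in pairs)

    def kth_length(k: int) -> int:
        """Length of the k-th shortest value (0-based), counting repeats."""
        seen = 0
        for length, count in pairs:
            seen += count
            if k < seen:
                return length
        raise IndexError(k)

    if n % 2:
        median = kth_length(n // 2)
    else:
        median = (kth_length(n // 2 - 1) + kth_length(n // 2)) / 2
    return {
        "max_length": pairs[-1][0],
        "median_length": median,
        "mean_length": total_chars / n,
        "min_length": pairs[0][0],
        "total_characters": total_chars,
    }


gender = {"female": 216878, "male": 214412}
print(length_summary(gender))
```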

Characters and Unicode

Total characters: 2,158,916
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: female
2nd row: female
3rd row: female
4th row: female
5th row: female

Common Values

Value | Count | Frequency (%)
female | 216878 | 50.3%
male | 214412 | 49.7%

Length

[Figure: Histogram of lengths of the category]

[Figure: Category Frequency Plot]

Words

Value | Count | Frequency (%)
female | 216878 | 50.3%
male | 214412 | 49.7%

Most occurring characters

Value | Count | Frequency (%)
e | 648168 | 30.0%
m | 431290 | 20.0%
a | 431290 | 20.0%
l | 431290 | 20.0%
f | 216878 | 10.0%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 2158916 | 100.0%

Most frequent character per category

Lowercase Letter
Same as Most occurring characters above.

Most occurring scripts

Value | Count | Frequency (%)
Latin | 2158916 | 100.0%

Most frequent character per script

Latin
Same as Most occurring characters above.

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 2158916 | 100.0%

Most frequent character per block

ASCII
Same as Most occurring characters above.

computer
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.3 MiB
yes: 349317
no: 62705
NA: 19268

Length

Max length: 3
Median length: 3
Mean length: 2.80993531
Min length: 2

Characters and Unicode

Total characters: 1,211,897
Distinct characters: 7
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: NA
2nd row: NA
3rd row: NA
4th row: NA
5th row: NA

Common Values

Value | Count | Frequency (%)
yes | 349317 | 81.0%
no | 62705 | 14.5%
NA | 19268 | 4.5%
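Although Missing is 0 for computer, 19268 rows hold the string "NA", so missing answers are stored as a category rather than as nulls. The same pattern appears in internet, desk, room, mother_educ, father_educ and television, and dishwasher is entirely "NA". A minimal sketch of turning such placeholders into real missing values before profiling; the placeholder set {"NA"} is an assumption based on the values seen in this report.

```python
from typing import Optional, Sequence

PLACEHOLDERS = frozenset({"NA"})  # assumed placeholder strings


def to_missing(values: Sequence[str]) -> list:
    """Replace placeholder strings with None so they count as missing."""
    return [None if v in PLACEHOLDERS else v for v in values]


def missing_share(values: Sequence[Optional[str]]) -> float:
    """Percentage of entries that are None."""
    return 100.0 * sum(v is None for v in values) / len(values)


# Rebuild the computer column from its reported value counts.
computer = ["yes"] * 349317 + ["no"] * 62705 + ["NA"] * 19268
cleaned = to_missing(computer)
print(f"{missing_share(cleaned):.1f}%")  # 19268 / 431290
```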

Length

[Figure: Histogram of lengths of the category]

[Figure: Category Frequency Plot]

Words

Value | Count | Frequency (%)
yes | 349317 | 81.0%
no | 62705 | 14.5%
na | 19268 | 4.5%

Most occurring characters

Value | Count | Frequency (%)
y | 349317 | 28.8%
e | 349317 | 28.8%
s | 349317 | 28.8%
n | 62705 | 5.2%
o | 62705 | 5.2%
N | 19268 | 1.6%
A | 19268 | 1.6%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 1173361 | 96.8%
Uppercase Letter | 38536 | 3.2%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
y | 349317 | 29.8%
e | 349317 | 29.8%
s | 349317 | 29.8%
n | 62705 | 5.3%
o | 62705 | 5.3%

Uppercase Letter
Value | Count | Frequency (%)
N | 19268 | 50.0%
A | 19268 | 50.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 1211897 | 100.0%

Most frequent character per script

Latin
Same as Most occurring characters above.

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1211897 | 100.0%

Most frequent character per block

ASCII
Same as Most occurring characters above.

internet
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.3 MiB
yes: 365407
no: 46556
NA: 19327

Length

Max length: 3
Median length: 3
Mean length: 2.847241995
Min length: 2

Characters and Unicode

Total characters: 1,227,987
Distinct characters: 7
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: NA
2nd row: NA
3rd row: NA
4th row: NA
5th row: NA

Common Values

Value | Count | Frequency (%)
yes | 365407 | 84.7%
no | 46556 | 10.8%
NA | 19327 | 4.5%

Length

[Figure: Histogram of lengths of the category]

[Figure: Category Frequency Plot]

Words

Value | Count | Frequency (%)
yes | 365407 | 84.7%
no | 46556 | 10.8%
na | 19327 | 4.5%

Most occurring characters

Value | Count | Frequency (%)
y | 365407 | 29.8%
e | 365407 | 29.8%
s | 365407 | 29.8%
n | 46556 | 3.8%
o | 46556 | 3.8%
N | 19327 | 1.6%
A | 19327 | 1.6%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 1189333 | 96.9%
Uppercase Letter | 38654 | 3.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
y | 365407 | 30.7%
e | 365407 | 30.7%
s | 365407 | 30.7%
n | 46556 | 3.9%
o | 46556 | 3.9%

Uppercase Letter
Value | Count | Frequency (%)
N | 19327 | 50.0%
A | 19327 | 50.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 1227987 | 100.0%

Most frequent character per script

Latin
Same as Most occurring characters above.

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1227987 | 100.0%

Most frequent character per block

ASCII
Same as Most occurring characters above.

math
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 224921
Distinct (%): 52.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 458.052289
Minimum: 0
Maximum: 860.903
Zeros: 2
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 3.3 MiB

Quantile statistics

Minimum: 0
5th percentile: 291.717
Q1: 385.14425
Median: 457.938
Q3: 530.844
95th percentile: 624.57065
Maximum: 860.903
Range: 860.903
Interquartile range (IQR): 145.69975

Descriptive statistics

Standard deviation: 102.006902
Coefficient of variation (CV): 0.2226970686
Kurtosis: -0.3197276195
Mean: 458.052289
Median Absolute Deviation (MAD): 72.85
Skewness: -0.00141234106
Sum: 197553371.7
Variance: 10405.40806
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
482.817 | 11 | < 0.1%
361.954 | 10 | < 0.1%
392.042 | 10 | < 0.1%
368.967 | 10 | < 0.1%
437.911 | 10 | < 0.1%
384.647 | 10 | < 0.1%
468.493 | 10 | < 0.1%
441.804 | 10 | < 0.1%
398.712 | 9 | < 0.1%
503.436 | 9 | < 0.1%
Other values (224911) | 431191 | > 99.9%

Minimum 10 values

Value | Count | Frequency (%)
0 | 2 | < 0.1%
8.473 | 1 | < 0.1%
28.435 | 1 | < 0.1%
29.428 | 1 | < 0.1%
31.604 | 1 | < 0.1%
39.283 | 1 | < 0.1%
41.607 | 1 | < 0.1%
52.377 | 1 | < 0.1%
52.627 | 1 | < 0.1%
54.473 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
860.903 | 1 | < 0.1%
852.526 | 1 | < 0.1%
847.23 | 1 | < 0.1%
846.55 | 1 | < 0.1%
842.615 | 1 | < 0.1%
842.477 | 1 | < 0.1%
841.922 | 1 | < 0.1%
834.321 | 1 | < 0.1%
830.881 | 1 | < 0.1%
829.383 | 1 | < 0.1%
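The numeric histograms use 50 equal-width bins spanning the minimum to the maximum. A sketch of that binning, assuming NumPy's convention that every bin is half-open except the last, which also includes the maximum; floating-point edge handling is simplified compared with NumPy.

```python
from typing import Sequence


def fixed_bins(data: Sequence[float], bins: int = 50) -> tuple:
    """Equal-width bin edges and counts between min(data) and max(data)."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins if hi > lo else 1.0
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for x in data:
        # Values equal to the maximum fall into the last (closed) bin.
        i = min(int((x - lo) / width), bins - 1)
        counts[i] += 1
    return edges, counts


edges, counts = fixed_bins(list(range(11)), bins=5)
print(edges)
print(counts)
```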

read
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 232996
Distinct (%): 54.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 462.8403368
Minimum: 0
Maximum: 861.854
Zeros: 4
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 3.3 MiB

Quantile statistics

Minimum: 0
5th percentile: 284.94125
Q1: 389.542
Median: 466.2455
Q3: 539.10575
95th percentile: 630.22455
Maximum: 861.854
Range: 861.854
Interquartile range (IQR): 149.56375

Descriptive statistics

Standard deviation: 105.7651463
Coefficient of variation (CV): 0.2285132429
Kurtosis: -0.2557301969
Mean: 462.8403368
Median Absolute Deviation (MAD): 74.6505
Skewness: -0.1479724736
Sum: 199618408.8
Variance: 11186.26617
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
422.693 | 10 | < 0.1%
361.062 | 10 | < 0.1%
419.162 | 9 | < 0.1%
452.839 | 9 | < 0.1%
475.697 | 9 | < 0.1%
415.723 | 9 | < 0.1%
507.49 | 9 | < 0.1%
522.363 | 9 | < 0.1%
527.936 | 9 | < 0.1%
423.564 | 9 | < 0.1%
Other values (232986) | 431198 | > 99.9%

Minimum 10 values

Value | Count | Frequency (%)
0 | 4 | < 0.1%
5.384 | 1 | < 0.1%
9.294 | 1 | < 0.1%
16.466 | 1 | < 0.1%
26.806 | 1 | < 0.1%
34.198 | 1 | < 0.1%
36.158 | 1 | < 0.1%
40.692 | 1 | < 0.1%
41.899 | 1 | < 0.1%
43.92 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
861.854 | 1 | < 0.1%
851.085 | 1 | < 0.1%
850.75 | 1 | < 0.1%
846.678 | 1 | < 0.1%
844.291 | 1 | < 0.1%
833.919 | 1 | < 0.1%
832.034 | 1 | < 0.1%
829.295 | 1 | < 0.1%
827.036 | 1 | < 0.1%
827.026 | 1 | < 0.1%

science
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 199531
Distinct (%): 46.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 465.5219762
Minimum: 25.103
Maximum: 888.359
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.3 MiB

Quantile statistics

Minimum: 25.103
5th percentile: 303.9999
Q1: 390.4915
Median: 462.747
Q3: 538.477
95th percentile: 635.587
Maximum: 888.359
Range: 863.256
Interquartile range (IQR): 147.9855

Descriptive statistics

Standard deviation: 102.0113656
Coefficient of variation (CV): 0.2191332973
Kurtosis: -0.3932461886
Mean: 465.5219762
Median Absolute Deviation (MAD): 73.939
Skewness: 0.1039951968
Sum: 200774973.1
Variance: 10406.31871
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
467.58 | 12 | < 0.1%
461.453 | 12 | < 0.1%
429.74 | 11 | < 0.1%
502.319 | 11 | < 0.1%
435.278 | 11 | < 0.1%
468.474 | 11 | < 0.1%
397.446 | 10 | < 0.1%
351.894 | 10 | < 0.1%
437.947 | 10 | < 0.1%
449.427 | 10 | < 0.1%
Other values (199521) | 431182 | > 99.9%

Minimum 10 values

Value | Count | Frequency (%)
25.103 | 1 | < 0.1%
58.748 | 1 | < 0.1%
66.498 | 1 | < 0.1%
75.748 | 1 | < 0.1%
78.33 | 1 | < 0.1%
88.16 | 1 | < 0.1%
88.265 | 1 | < 0.1%
88.655 | 1 | < 0.1%
98.086 | 1 | < 0.1%
101.954 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
888.359 | 1 | < 0.1%
876.746 | 1 | < 0.1%
871.481 | 1 | < 0.1%
870.02 | 1 | < 0.1%
863.713 | 1 | < 0.1%
860.276 | 1 | < 0.1%
859.783 | 1 | < 0.1%
856.619 | 1 | < 0.1%
856.247 | 1 | < 0.1%
855.72 | 1 | < 0.1%

stu_wgt
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 33214
Distinct (%): 7.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 54.45132543
Minimum: 1
Maximum: 2160.911
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.3 MiB

Quantile statistics

Minimum: 1
5th percentile: 1.07692
Q1: 5.83732
Median: 13.86692
Q3: 56.88371
95th percentile: 200.8544
Maximum: 2160.911
Range: 2159.911
Interquartile range (IQR): 51.04639

Descriptive statistics

Standard deviation: 109.4962558
Coefficient of variation (CV): 2.01090157
Kurtosis: 35.09835109
Mean: 54.45132543
Median Absolute Deviation (MAD): 11.68725
Skewness: 4.858503871
Sum: 23484312.14
Variance: 11989.43003
Monotonicity: Not monotonic
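Several reported figures are functions of others, which gives a quick consistency check on the report: CV = standard deviation / mean, IQR = Q3 - Q1, and Variance = standard deviation squared. A short check using the values printed for math and stu_wgt; the relative tolerance of 1e-6 is an arbitrary choice that absorbs the report's rounding.

```python
import math

# Figures copied from the math and stu_wgt sections of this report.
reported = {
    "math": {"mean": 458.052289, "std": 102.006902, "cv": 0.2226970686,
             "q1": 385.14425, "q3": 530.844, "iqr": 145.69975,
             "variance": 10405.40806},
    "stu_wgt": {"mean": 54.45132543, "std": 109.4962558, "cv": 2.01090157,
                "q1": 5.83732, "q3": 56.88371, "iqr": 51.04639,
                "variance": 11989.43003},
}

checks = {}
for name, r in reported.items():
    checks[name] = {
        "cv": math.isclose(r["std"] / r["mean"], r["cv"], rel_tol=1e-6),
        "iqr": math.isclose(r["q3"] - r["q1"], r["iqr"], rel_tol=1e-6),
        "variance": math.isclose(r["std"] ** 2, r["variance"], rel_tol=1e-6),
    }
    print(name, checks[name])
```

All three relations hold for both variables. The CV above 2 for stu_wgt, with a mean of 54.45 against a median of 13.87, reflects its strong right skew (skewness 4.86).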
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
1 | 5558 | 1.3%
32.64061 | 810 | 0.2%
4.54703 | 767 | 0.2%
1.02139 | 762 | 0.2%
1.38571 | 562 | 0.1%
1.0625 | 523 | 0.1%
56.88371 | 431 | 0.1%
59.80729 | 429 | 0.1%
1.06667 | 420 | 0.1%
5.22017 | 327 | 0.1%
Other values (33204) | 420701 | 97.5%
ValueCountFrequency (%)
15558
1.3%
1.000699
 
< 0.1%
1.0031333
 
< 0.1%
1.00503123
 
< 0.1%
1.00719139
 
< 0.1%
1.00806124
 
< 0.1%
1.01100
 
< 0.1%
1.01031194
 
< 0.1%
1.01047191
 
< 0.1%
1.0108748
 
< 0.1%
ValueCountFrequency (%)
2160.91112
< 0.1%
1968.5328
< 0.1%
1890.57910
< 0.1%
1766.9163
 
< 0.1%
1743.3173
 
< 0.1%
1717.2694
 
< 0.1%
1603.93416
< 0.1%
1548.7985
 
< 0.1%
1442.6972
 
< 0.1%
1439.28713
< 0.1%

desk
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.3 MiB
yes: 359611
no: 52005
NA: 19674

Length

Max length: 3
Median length: 3
Mean length: 2.833803241
Min length: 2

Characters and Unicode

Total characters: 1,222,191
Distinct characters: 7
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: yes
2nd row: yes
3rd row: NA
4th row: yes
5th row: yes

Common Values

Value | Count | Frequency (%)
yes | 359611 | 83.4%
no | 52005 | 12.1%
NA | 19674 | 4.6%

Length

[Figure: Histogram of lengths of the category]

[Figure: Category Frequency Plot]

Words

Value | Count | Frequency (%)
yes | 359611 | 83.4%
no | 52005 | 12.1%
na | 19674 | 4.6%

Most occurring characters

Value | Count | Frequency (%)
y | 359611 | 29.4%
e | 359611 | 29.4%
s | 359611 | 29.4%
n | 52005 | 4.3%
o | 52005 | 4.3%
N | 19674 | 1.6%
A | 19674 | 1.6%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 1182843 | 96.8%
Uppercase Letter | 39348 | 3.2%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
y | 359611 | 30.4%
e | 359611 | 30.4%
s | 359611 | 30.4%
n | 52005 | 4.4%
o | 52005 | 4.4%

Uppercase Letter
Value | Count | Frequency (%)
N | 19674 | 50.0%
A | 19674 | 50.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 1222191 | 100.0%

Most frequent character per script

Latin
Same as Most occurring characters above.

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1222191 | 100.0%

Most frequent character per block

ASCII
Same as Most occurring characters above.

room
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.3 MiB
yes: 331694
no: 75412
NA: 24184

Length

Max length: 3
Median length: 3
Mean length: 2.769074173
Min length: 2

Characters and Unicode

Total characters: 1,194,274
Distinct characters: 7
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: NA
2nd row: NA
3rd row: NA
4th row: NA
5th row: NA

Common Values

Value | Count | Frequency (%)
yes | 331694 | 76.9%
no | 75412 | 17.5%
NA | 24184 | 5.6%

Length

[Figure: Histogram of lengths of the category]

[Figure: Category Frequency Plot]

Words

Value | Count | Frequency (%)
yes | 331694 | 76.9%
no | 75412 | 17.5%
na | 24184 | 5.6%

Most occurring characters

Value | Count | Frequency (%)
y | 331694 | 27.8%
e | 331694 | 27.8%
s | 331694 | 27.8%
n | 75412 | 6.3%
o | 75412 | 6.3%
N | 24184 | 2.0%
A | 24184 | 2.0%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 1145906 | 96.0%
Uppercase Letter | 48368 | 4.0%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
y | 331694 | 28.9%
e | 331694 | 28.9%
s | 331694 | 28.9%
n | 75412 | 6.6%
o | 75412 | 6.6%

Uppercase Letter
Value | Count | Frequency (%)
N | 24184 | 50.0%
A | 24184 | 50.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 1194274 | 100.0%

Most frequent character per script

Latin
Same as Most occurring characters above.

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1194274 | 100.0%

Most frequent character per block

ASCII
Same as Most occurring characters above.

dishwasher
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.3 MiB
NA: 431290

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 862,580
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: NA
2nd row: NA
3rd row: NA
4th row: NA
5th row: NA

Common Values

Value | Count | Frequency (%)
NA | 431290 | 100.0%

Length

2022-10-28T20:20:39.532882image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
Histogram of lengths of the category

Category Frequency Plot

Value  Count  Frequency (%)
na  431290  100.0%

Most occurring characters

Value  Count  Frequency (%)
N  431290  50.0%
A  431290  50.0%

Most occurring categories

Value  Count  Frequency (%)
Uppercase Letter  862580  100.0%

Most frequent character per category

Uppercase Letter
Value  Count  Frequency (%)
N  431290  50.0%
A  431290  50.0%

Most occurring scripts

Value  Count  Frequency (%)
Latin  862580  100.0%

Most frequent character per script

Latin
Value  Count  Frequency (%)
N  431290  50.0%
A  431290  50.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  862580  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
N  431290  50.0%
A  431290  50.0%

television
Categorical

HIGH CORRELATION

Distinct5
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.3 MiB
3+  176069
2  135142
1  95561
NA  17668
0  6850

Length

Max length2
Median length1
Mean length1.449203552
Min length1

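The length figures for `television` follow directly from its value counts: each value's string length times its count, summed over values, gives the total number of characters, and dividing by the row count gives the mean length. This worked check uses only counts from the Common Values table; it also shows that the "Total characters" figure printed as 625.027 means 625,027, with "." used as a thousands separator.

```python
# Value counts for `television`, copied from its Common Values table.
counts = {"3+": 176069, "2": 135142, "1": 95561, "NA": 17668, "0": 6850}

n_rows = sum(counts.values())                             # 431290
total_chars = sum(len(v) * c for v, c in counts.items())  # 625027
mean_length = total_chars / n_rows                        # about 1.449203552

print(n_rows, total_chars, round(mean_length, 9))
```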
Characters and Unicode

Total characters625.027
Distinct characters7
Distinct categories3
Distinct scripts2
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st rowNA
2nd rowNA
3rd rowNA
4th rowNA
5th rowNA

Common Values

Value  Count  Frequency (%)
3+  176069  40.8%
2  135142  31.3%
1  95561  22.2%
NA  17668  4.1%
0  6850  1.6%

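Missing is reported as 0 for `television` even though 17,668 rows (4.1%) hold "NA": the missing answers are stored as a two-character string, so the profiler counts them as an ordinary category. A sketch of recoding them in pandas, assuming "NA" stands for a missing answer rather than a genuine response option; the sample values are illustrative.

```python
import pandas as pd

# Toy sample using the same codes as `television` (the real column has 431,290 rows).
tv = pd.Series(["3+", "2", "1", "NA", "0", "3+"], name="television")

# "NA" is an ordinary string here, so pandas sees no missing values.
print(tv.isna().sum())        # 0

# Assumption: "NA" means "no answer", so turn it into a real missing value.
tv_clean = tv.mask(tv == "NA")
print(tv_clean.isna().sum())  # 1

# Share of "NA" in the full column, from the counts reported above.
print(round(17668 / 431290 * 100, 1))  # 4.1
```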
Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count  Frequency (%)
3  176069  40.8%
2  135142  31.3%
1  95561  22.2%
na  17668  4.1%
0  6850  1.6%

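The Category Frequency Plot tables count words rather than whole values: "3+" is listed as "3", "NA" as "na", "more than 500" (in `book`) is split into three words, and negative `wealth` values appear without their minus sign. The sketch reproduces that behaviour under an assumed tokenization (lower-case, split on whitespace, strip leading and trailing punctuation); the rule is inferred from these tables, not taken from the profiler's source.

```python
import string

import pandas as pd


def word_counts(values: pd.Series) -> pd.Series:
    """Lower-case each value, split on whitespace, strip outer punctuation, count words."""
    words = values.str.lower().str.split().explode()
    words = words.str.strip(string.punctuation)
    return words.value_counts()


sample = pd.Series(["3+", "2", "NA", "3+", "more than 500", "-3.7458"])
print(word_counts(sample).to_dict())
```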
Most occurring characters

Value  Count  Frequency (%)
3  176069  28.2%
+  176069  28.2%
2  135142  21.6%
1  95561  15.3%
N  17668  2.8%
A  17668  2.8%
0  6850  1.1%

Most occurring categories

Value  Count  Frequency (%)
Decimal Number  413622  66.2%
Math Symbol  176069  28.2%
Uppercase Letter  35336  5.7%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
3  176069  42.6%
2  135142  32.7%
1  95561  23.1%
0  6850  1.7%
Uppercase Letter
Value  Count  Frequency (%)
N  17668  50.0%
A  17668  50.0%
Math Symbol
Value  Count  Frequency (%)
+  176069  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Common  589691  94.3%
Latin  35336  5.7%

Most frequent character per script

Common
Value  Count  Frequency (%)
3  176069  29.9%
+  176069  29.9%
2  135142  22.9%
1  95561  16.2%
0  6850  1.2%
Latin
Value  Count  Frequency (%)
N  17668  50.0%
A  17668  50.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  625027  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
3  176069  28.2%
+  176069  28.2%
2  135142  21.6%
1  95561  15.3%
N  17668  2.8%
A  17668  2.8%
0  6850  1.1%

computer_n
Categorical

HIGH CORRELATION

Distinct5
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.3 MiB
3+  141643
2  112124
1  110264
0  48050
NA  19209

Length

Max length2
Median length1
Mean length1.372955552
Min length1

Characters and Unicode

Total characters592.142
Distinct characters7
Distinct categories3
Distinct scripts2
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st rowNA
2nd rowNA
3rd rowNA
4th rowNA
5th rowNA

Common Values

Value  Count  Frequency (%)
3+  141643  32.8%
2  112124  26.0%
1  110264  25.6%
0  48050  11.1%
NA  19209  4.5%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count  Frequency (%)
3  141643  32.8%
2  112124  26.0%
1  110264  25.6%
0  48050  11.1%
na  19209  4.5%

Most occurring characters

Value  Count  Frequency (%)
3  141643  23.9%
+  141643  23.9%
2  112124  18.9%
1  110264  18.6%
0  48050  8.1%
N  19209  3.2%
A  19209  3.2%

Most occurring categories

Value  Count  Frequency (%)
Decimal Number  412081  69.6%
Math Symbol  141643  23.9%
Uppercase Letter  38418  6.5%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
3  141643  34.4%
2  112124  27.2%
1  110264  26.8%
0  48050  11.7%
Uppercase Letter
Value  Count  Frequency (%)
N  19209  50.0%
A  19209  50.0%
Math Symbol
Value  Count  Frequency (%)
+  141643  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Common  553724  93.5%
Latin  38418  6.5%

Most frequent character per script

Common
Value  Count  Frequency (%)
3  141643  25.6%
+  141643  25.6%
2  112124  20.2%
1  110264  19.9%
0  48050  8.7%
Latin
Value  Count  Frequency (%)
N  19209  50.0%
A  19209  50.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  592142  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
3  141643  23.9%
+  141643  23.9%
2  112124  18.9%
1  110264  18.6%
0  48050  8.1%
N  19209  3.2%
A  19209  3.2%

car
Categorical

HIGH CORRELATION

Distinct5
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.3 MiB
1  134766
2  126242
0  78110
3+  66877
NA  25295

Length

Max length2
Median length1
Mean length1.213712351
Min length1

Characters and Unicode

Total characters523.462
Distinct characters7
Distinct categories3
Distinct scripts2
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st rowNA
2nd rowNA
3rd rowNA
4th rowNA
5th rowNA

Common Values

Value  Count  Frequency (%)
1  134766  31.2%
2  126242  29.3%
0  78110  18.1%
3+  66877  15.5%
NA  25295  5.9%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count  Frequency (%)
1  134766  31.2%
2  126242  29.3%
0  78110  18.1%
3  66877  15.5%
na  25295  5.9%

Most occurring characters

Value  Count  Frequency (%)
1  134766  25.7%
2  126242  24.1%
0  78110  14.9%
3  66877  12.8%
+  66877  12.8%
N  25295  4.8%
A  25295  4.8%

Most occurring categories

Value  Count  Frequency (%)
Decimal Number  405995  77.6%
Math Symbol  66877  12.8%
Uppercase Letter  50590  9.7%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
1  134766  33.2%
2  126242  31.1%
0  78110  19.2%
3  66877  16.5%
Uppercase Letter
Value  Count  Frequency (%)
N  25295  50.0%
A  25295  50.0%
Math Symbol
Value  Count  Frequency (%)
+  66877  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Common  472872  90.3%
Latin  50590  9.7%

Most frequent character per script

Common
Value  Count  Frequency (%)
1  134766  28.5%
2  126242  26.7%
0  78110  16.5%
3  66877  14.1%
+  66877  14.1%
Latin
Value  Count  Frequency (%)
N  25295  50.0%
A  25295  50.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  523462  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
1  134766  25.7%
2  126242  24.1%
0  78110  14.9%
3  66877  12.8%
+  66877  12.8%
N  25295  4.8%
A  25295  4.8%

book
Categorical

HIGH CORRELATION

Distinct7
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.3 MiB
26-100  114839
0-10  85151
11-25  83872
101-200  60314
201-500  44070
Other values (2)  43044

Length

Max length13
Median length7
Mean length5.921199193
Min length2

Characters and Unicode

Total characters2.553.754
Distinct characters17
Distinct categories5
Distinct scripts2
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st rowmore than 500
2nd row201-500
3rd rowmore than 500
4th row101-200
5th row11-25

Common Values

Value  Count  Frequency (%)
26-100  114839  26.6%
0-10  85151  19.7%
11-25  83872  19.4%
101-200  60314  14.0%
201-500  44070  10.2%
more than 500  26180  6.1%
NA  16864  3.9%

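The `book` bins are listed by frequency, and sorting them as plain strings would also misorder them, because "101-200" sorts before "11-25". A sketch using a pandas ordered categorical so that bins sort and compare by size; the order list is written from the bin labels above, and leaving "NA" out of it (so that it becomes a missing value) is an assumption.

```python
import pandas as pd

# Natural order of the `book` bins; "NA" is deliberately left out.
BOOK_ORDER = ["0-10", "11-25", "26-100", "101-200", "201-500", "more than 500"]
book_dtype = pd.CategoricalDtype(categories=BOOK_ORDER, ordered=True)

books = pd.Series(["26-100", "more than 500", "0-10", "NA", "101-200"])

# Plain string sorting misplaces the bins ("101-200" comes before "11-25").
print(sorted(BOOK_ORDER))

# Values outside the declared categories ("NA") become missing after astype().
books_cat = books.astype(book_dtype)
print(books_cat.sort_values().tolist())  # bins in size order, missing value last
print((books_cat > "26-100").tolist())   # ordered comparison; missing compares False
```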
Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count  Frequency (%)
26-100  114839  23.7%
0-10  85151  17.6%
11-25  83872  17.3%
101-200  60314  12.5%
201-500  44070  9.1%
more  26180  5.4%
than  26180  5.4%
500  26180  5.4%
na  16864  3.5%

Most occurring characters

Value  Count  Frequency (%)
0  765492  30.0%
1  532432  20.8%
-  388246  15.2%
2  303095  11.9%
5  154122  6.0%
6  114839  4.5%
(space)  52360  2.1%
t  26180  1.0%
n  26180  1.0%
a  26180  1.0%
Other values (7)  164628  6.4%

Most occurring categories

Value  Count  Frequency (%)
Decimal Number  1869980  73.2%
Dash Punctuation  388246  15.2%
Lowercase Letter  209440  8.2%
Space Separator  52360  2.1%
Uppercase Letter  33728  1.3%

Most frequent character per category

Lowercase Letter
Value  Count  Frequency (%)
t  26180  12.5%
n  26180  12.5%
a  26180  12.5%
h  26180  12.5%
r  26180  12.5%
e  26180  12.5%
o  26180  12.5%
m  26180  12.5%
Decimal Number
Value  Count  Frequency (%)
0  765492  40.9%
1  532432  28.5%
2  303095  16.2%
5  154122  8.2%
6  114839  6.1%
Uppercase Letter
Value  Count  Frequency (%)
N  16864  50.0%
A  16864  50.0%
Dash Punctuation
Value  Count  Frequency (%)
-  388246  100.0%
Space Separator
Value  Count  Frequency (%)
(space)  52360  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Common  2310586  90.5%
Latin  243168  9.5%

Most frequent character per script

Latin
Value  Count  Frequency (%)
t  26180  10.8%
n  26180  10.8%
a  26180  10.8%
h  26180  10.8%
r  26180  10.8%
e  26180  10.8%
o  26180  10.8%
m  26180  10.8%
N  16864  6.9%
A  16864  6.9%
Common
Value  Count  Frequency (%)
0  765492  33.1%
1  532432  23.0%
-  388246  16.8%
2  303095  13.1%
5  154122  6.7%
6  114839  5.0%
(space)  52360  2.3%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  2553754  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0  765492  30.0%
1  532432  20.8%
-  388246  15.2%
2  303095  11.9%
5  154122  6.0%
6  114839  4.5%
(space)  52360  2.1%
t  26180  1.0%
n  26180  1.0%
a  26180  1.0%
Other values (7)  164628  6.4%

wealth
Categorical

HIGH CARDINALITY

Distinct47482
Distinct (%)11.0%
Missing0
Missing (%)0.0%
Memory size3.3 MiB
NA  14890
2.3517  731
2.3725  626
4.2559  471
4.1549  439
Other values (47477)  414133

Length

Max length21
Median length7
Mean length6.345829952
Min length1

Characters and Unicode

Total characters2.736.893
Distinct characters15
Distinct categories5
Distinct scripts2
Distinct blocks1

Unique

Unique7129
Unique (%)1.7%

Sample

1st rowNA
2nd rowNA
3rd rowNA
4th rowNA
5th rowNA

Common Values

Value  Count  Frequency (%)
NA  14890  3.5%
2.3517  731  0.2%
2.3725  626  0.1%
4.2559  471  0.1%
4.1549  439  0.1%
-3.7458  385  0.1%
-3.2413  317  0.1%
2.752  301  0.1%
2.7852  265  0.1%
3.3323  237  0.1%
Other values (47472)  412628  95.7%

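`wealth` is profiled as a categorical variable with 47,482 distinct values because the column is text: numbers are mixed with the string "NA" (14,890 rows), and the lowercase "e" characters counted in the tables below are most likely exponents in scientific notation such as "1e-04" (an inference, not something the report states). A sketch of converting such a column to floats with `pd.to_numeric`, using illustrative values; the same approach would apply to `escs`.

```python
import pandas as pd

# Toy strings in the formats seen in `wealth`: decimals, negatives, an exponent, and "NA".
wealth_raw = pd.Series(["2.3517", "-3.7458", "NA", "1e-04", "4.2559"])

# errors="coerce" turns anything that is not a number (here "NA") into NaN.
wealth = pd.to_numeric(wealth_raw, errors="coerce")

print(wealth.dtype)         # float64
print(wealth.isna().sum())  # 1
print(wealth.describe())    # numeric summary instead of category counts
```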
Length

Histogram of lengths of the category
Value  Count  Frequency (%)
na  14890  3.5%
2.3517  732  0.2%
2.3725  626  0.1%
4.2559  471  0.1%
4.1549  439  0.1%
3.7458  385  0.1%
3.2413  317  0.1%
2.752  301  0.1%
2.7852  268  0.1%
3.3323  237  0.1%
Other values (29640)  412624  95.7%

Most occurring characters

Value  Count  Frequency (%)
.  416066  15.2%
0  398810  14.6%
1  286211  10.5%
-  247557  9.0%
2  214134  7.8%
3  184585  6.7%
4  170943  6.2%
5  165398  6.0%
6  159873  5.8%
7  159172  5.8%
Other values (5)  334144  12.2%

Most occurring categories

Value  Count  Frequency (%)
Decimal Number  2043195  74.7%
Other Punctuation  416066  15.2%
Dash Punctuation  247557  9.0%
Uppercase Letter  29780  1.1%
Lowercase Letter  295  < 0.1%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0  398810  19.5%
1  286211  14.0%
2  214134  10.5%
3  184585  9.0%
4  170943  8.4%
5  165398  8.1%
6  159873  7.8%
7  159172  7.8%
8  153398  7.5%
9  150671  7.4%
Uppercase Letter
Value  Count  Frequency (%)
N  14890  50.0%
A  14890  50.0%
Other Punctuation
Value  Count  Frequency (%)
.  416066  100.0%
Dash Punctuation
Value  Count  Frequency (%)
-  247557  100.0%
Lowercase Letter
Value  Count  Frequency (%)
e  295  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Common  2706818  98.9%
Latin  30075  1.1%

Most frequent character per script

Common
Value  Count  Frequency (%)
.  416066  15.4%
0  398810  14.7%
1  286211  10.6%
-  247557  9.1%
2  214134  7.9%
3  184585  6.8%
4  170943  6.3%
5  165398  6.1%
6  159873  5.9%
7  159172  5.9%
Other values (2)  304069  11.2%
Latin
Value  Count  Frequency (%)
N  14890  49.5%
A  14890  49.5%
e  295  1.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  2736893  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
.  416066  15.2%
0  398810  14.6%
1  286211  10.5%
-  247557  9.0%
2  214134  7.8%
3  184585  6.7%
4  170943  6.2%
5  165398  6.0%
6  159873  5.8%
7  159172  5.8%
Other values (5)  334144  12.2%

escs
Categorical

HIGH CARDINALITY

Distinct52733
Distinct (%)12.2%
Missing0
Missing (%)0.0%
Memory size3.3 MiB
NA  14944
0.8731  30
0.4149  28
-0.0998  27
0.5469  27
Other values (52728)  416234

Length

Max length21
Median length20
Mean length6.295413759
Min length1

Characters and Unicode

Total characters2.715.149
Distinct characters15
Distinct categories5
Distinct scripts2
Distinct blocks1

Unique

Unique7650
Unique (%)1.8%

Sample

1st row0.30620000000000003
2nd row1.478
3rd rowNA
4th row0.2243
5th row0.1467

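The first sample row of `escs` reads 0.30620000000000003, while most values carry four decimals. That looks like a floating-point formatting artefact, and because the column is stored as text, two spellings of one score would count as two distinct categories. A sketch of normalizing by parsing and rounding, assuming four decimals is the intended precision; the sample strings are illustrative.

```python
import pandas as pd

# Toy strings: one escs value written with float-formatting noise, one written cleanly.
escs_raw = pd.Series(["0.30620000000000003", "0.3062", "1.478", "NA"])
print(escs_raw.nunique())  # 4 distinct strings

# Parse ("NA" becomes NaN) and round to the four decimals most values carry.
escs = pd.to_numeric(escs_raw, errors="coerce").round(4)
print(escs.nunique())      # 2: 0.3062 and 1.478 (NaN is not counted)
```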
Common Values

Value  Count  Frequency (%)
NA  14944  3.5%
0.8731  30  < 0.1%
0.4149  28  < 0.1%
-0.0998  27  < 0.1%
0.5469  27  < 0.1%
0.7977  27  < 0.1%
0.7139  26  < 0.1%
0.9498  26  < 0.1%
0.449  26  < 0.1%
0.3348  26  < 0.1%
Other values (52723)  416103  96.5%

Length

Histogram of lengths of the category
Value  Count  Frequency (%)
na  14944  3.5%
0.449  48  < 0.1%
0.9498  46  < 0.1%
0.7838  45  < 0.1%
0.7139  45  < 0.1%
0.5063  44  < 0.1%
0.1493  44  < 0.1%
0.992  44  < 0.1%
0.4149  44  < 0.1%
0.625  43  < 0.1%
Other values (32866)  415943  96.4%

Most occurring characters

Value  Count  Frequency (%)
.  416085  15.3%
0  400310  14.7%
1  304132  11.2%
-  226213  8.3%
2  199685  7.4%
3  174738  6.4%
4  167007  6.2%
5  163819  6.0%
6  161067  5.9%
7  160069  5.9%
Other values (5)  342024  12.6%

Most occurring categories

Value  Count  Frequency (%)
Decimal Number  2042737  75.2%
Other Punctuation  416085  15.3%
Dash Punctuation  226213  8.3%
Uppercase Letter  29888  1.1%
Lowercase Letter  226  < 0.1%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0  400310  19.6%
1  304132  14.9%
2  199685  9.8%
3  174738  8.6%
4  167007  8.2%
5  163819  8.0%
6  161067  7.9%
7  160069  7.8%
8  157114  7.7%
9  154796  7.6%
Uppercase Letter
Value  Count  Frequency (%)
N  14944  50.0%
A  14944  50.0%
Other Punctuation
Value  Count  Frequency (%)
.  416085  100.0%
Dash Punctuation
Value  Count  Frequency (%)
-  226213  100.0%
Lowercase Letter
Value  Count  Frequency (%)
e  226  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Common  2685035  98.9%
Latin  30114  1.1%

Most frequent character per script

Common
Value  Count  Frequency (%)
.  416085  15.5%
0  400310  14.9%
1  304132  11.3%
-  226213  8.4%
2  199685  7.4%
3  174738  6.5%
4  167007  6.2%
5  163819  6.1%
6  161067  6.0%
7  160069  6.0%
Other values (2)  311910  11.6%
Latin
Value  Count  Frequency (%)
N  14944  49.6%
A  14944  49.6%
e  226  0.8%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  2715149  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
.  416085  15.3%
0  400310  14.7%
1  304132  11.2%
-  226213  8.3%
2  199685  7.4%
3  174738  6.4%
4  167007  6.2%
5  163819  6.0%
6  161067  5.9%
7  160069  5.9%
Other values (5)  342024  12.6%

country_2
Categorical

HIGH CARDINALITY
HIGH CORRELATION

Distinct60
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.3 MiB
BRA  23141
CAN  20058
AUS  14530
ARE  14167
GBR  14157
Other values (55)  345237

Length

Max length3
Median length3
Mean length3
Min length3

Characters and Unicode

Total characters1.293.870
Distinct characters26
Distinct categories1
Distinct scripts1
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st rowISR
2nd rowISR
3rd rowISR
4th rowISR
5th rowISR

Common Values

Value  Count  Frequency (%)
BRA  23141  5.4%
CAN  20058  4.7%
AUS  14530  3.4%
ARE  14167  3.3%
GBR  14157  3.3%
QAT  12083  2.8%
COL  11795  2.7%
ITA  11583  2.7%
BEL  9651  2.2%
THA  8249  1.9%
Other values (50)  291876  67.7%

Length

Histogram of lengths of the category
Value  Count  Frequency (%)
bra  23141  5.4%
can  20058  4.7%
aus  14530  3.4%
are  14167  3.3%
gbr  14157  3.3%
qat  12083  2.8%
col  11795  2.7%
ita  11583  2.7%
bel  9651  2.2%
tha  8249  1.9%
Other values (50)  291876  67.7%

Most occurring characters

Value  Count  Frequency (%)
R  149516  11.6%
A  143566  11.1%
N  95098  7.3%
L  82082  6.3%
E  79483  6.1%
U  79479  6.1%
T  73263  5.7%
S  72899  5.6%
B  62638  4.8%
C  57164  4.4%
Other values (16)  398682  30.8%

Most occurring categories

Value  Count  Frequency (%)
Uppercase Letter  1293870  100.0%

Most frequent character per category

Uppercase Letter
Value  Count  Frequency (%)
R  149516  11.6%
A  143566  11.1%
N  95098  7.3%
L  82082  6.3%
E  79483  6.1%
U  79479  6.1%
T  73263  5.7%
S  72899  5.6%
B  62638  4.8%
C  57164  4.4%
Other values (16)  398682  30.8%

Most occurring scripts

Value  Count  Frequency (%)
Latin  1293870  100.0%

Most frequent character per script

Latin
Value  Count  Frequency (%)
R  149516  11.6%
A  143566  11.1%
N  95098  7.3%
L  82082  6.3%
E  79483  6.1%
U  79479  6.1%
T  73263  5.7%
S  72899  5.6%
B  62638  4.8%
C  57164  4.4%
Other values (16)  398682  30.8%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  1293870  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
R  149516  11.6%
A  143566  11.1%
N  95098  7.3%
L  82082  6.3%
E  79483  6.1%
U  79479  6.1%
T  73263  5.7%
S  72899  5.6%
B  62638  4.8%
C  57164  4.4%
Other values (16)  398682  30.8%

country_name
Categorical

HIGH CARDINALITY
HIGH CORRELATION

Distinct60
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.3 MiB
Brazil  23141
Canada  20058
Australia  14530
United Arab Emirates  14167
United Kingdom  14157
Other values (55)  345237

Length

Max length20
Median length18
Mean length7.920786478
Min length4

Characters and Unicode

Total characters3.416.156
Distinct characters46
Distinct categories3
Distinct scripts2
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st rowIsrael
2nd rowIsrael
3rd rowIsrael
4th rowIsrael
5th rowIsrael

Common Values

Value  Count  Frequency (%)
Brazil  23141  5.4%
Canada  20058  4.7%
Australia  14530  3.4%
United Arab Emirates  14167  3.3%
United Kingdom  14157  3.3%
Qatar  12083  2.8%
Colombia  11795  2.7%
Italy  11583  2.7%
Belgium  9651  2.2%
Thailand  8249  1.9%
Other values (50)  291876  67.7%

Length

Histogram of lengths of the category
Value  Count  Frequency (%)
united  34036  6.8%
brazil  23141  4.6%
canada  20058  4.0%
australia  14530  2.9%
arab  14167  2.8%
emirates  14167  2.8%
kingdom  14157  2.8%
qatar  12083  2.4%
colombia  11795  2.4%
italy  11583  2.3%
Other values (56)  331483  66.1%

Most occurring characters

Value  Count  Frequency (%)
a  508425  14.9%
i  295083  8.6%
n  258011  7.6%
e  246909  7.2%
r  223850  6.6%
l  178279  5.2%
t  173762  5.1%
o  171406  5.0%
d  136300  4.0%
u  113944  3.3%
Other values (36)  1110187  32.5%

Most occurring categories

Value  Count  Frequency (%)
Lowercase Letter  2845046  83.3%
Uppercase Letter  501200  14.7%
Space Separator  69910  2.0%

Most frequent character per category

Lowercase Letter
Value  Count  Frequency (%)
a  508425  17.9%
i  295083  10.4%
n  258011  9.1%
e  246909  8.7%
r  223850  7.9%
l  178279  6.3%
t  173762  6.1%
o  171406  6.0%
d  136300  4.8%
u  113944  4.0%
Other values (13)  539077  18.9%
Uppercase Letter
Value  Count  Frequency (%)
C  51581  10.3%
S  48218  9.6%
A  46438  9.3%
U  40098  8.0%
B  38720  7.7%
I  33806  6.7%
R  22518  4.5%
M  22192  4.4%
L  21239  4.2%
E  19754  3.9%
Other values (12)  156636  31.3%
Space Separator
Value  Count  Frequency (%)
(space)  69910  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Latin  3346246  98.0%
Common  69910  2.0%

Most frequent character per script

Latin
Value  Count  Frequency (%)
a  508425  15.2%
i  295083  8.8%
n  258011  7.7%
e  246909  7.4%
r  223850  6.7%
l  178279  5.3%
t  173762  5.2%
o  171406  5.1%
d  136300  4.1%
u  113944  3.4%
Other values (35)  1040277  31.1%
Common
Value  Count  Frequency (%)
(space)  69910  100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  3416156  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
a  508425  14.9%
i  295083  8.6%
n  258011  7.6%
e  246909  7.2%
r  223850  6.6%
l  178279  5.2%
t  173762  5.1%
o  171406  5.0%
d  136300  4.0%
u  113944  3.3%
Other values (36)  1110187  32.5%

rank
Real number (ℝ≥0)

HIGH CORRELATION

Distinct60
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean47.75911799
Minimum1
Maximum127
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size1.6 MiB

Quantile statistics

Minimum 1
5th percentile 8
Q1 19
Median 40
Q3 75
95th percentile 116
Maximum 127
Range 126
Interquartile range (IQR) 56

Descriptive statistics

Standard deviation33.26787061
Coefficient of variation (CV)0.696576319
Kurtosis-0.5933703325
Mean47.75911799
Median Absolute Deviation (MAD)25
Skewness0.6278080231
Sum20598030
Variance1106.751215
MonotonicityNot monotonic
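The spread measures above follow standard definitions: the coefficient of variation is the standard deviation divided by the mean (33.26787061 / 47.75911799 ≈ 0.6966), the IQR is Q3 minus Q1 (75 - 19 = 56), and MAD is the median of the absolute deviations from the median. A sketch computing them with pandas, assuming the sample standard deviation and pandas' default linear quantile interpolation; the profiler's exact conventions may differ.

```python
import pandas as pd


def spread_stats(s: pd.Series) -> dict:
    """Spread measures named in the report, using pandas defaults (ddof=1, linear quantiles)."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    median = s.median()
    return {
        "range": s.max() - s.min(),
        "iqr": q3 - q1,                      # Interquartile range (IQR)
        "cv": s.std() / s.mean(),            # Coefficient of variation (CV)
        "mad": (s - median).abs().median(),  # Median Absolute Deviation (MAD)
    }


stats = spread_stats(pd.Series([1, 2, 3, 4, 100]))
print(stats)

# Plugging in the reported rank figures reproduces the reported CV.
print(33.26787061 / 47.75911799)
```

The outlier 100 inflates the CV but barely moves the IQR or MAD, which is why the report lists both kinds of measure side by side.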
Histogram with fixed size bins (bins=50)
Common values
Value  Count  Frequency (%)
81  23141  5.4%
8  20058  4.7%
15  14530  3.4%
73  14167  3.3%
14  14157  3.3%
127  12083  2.8%
108  11795  2.7%
34  11583  2.7%
16  9651  2.2%
61  8249  1.9%
Other values (50)  291876  67.7%
Minimum 10 values
Value  Count  Frequency (%)
1  6115  1.4%
4  6647  1.5%
5  5581  1.3%
8  20058  4.7%
9  5385  1.2%
10  5860  1.4%
11  5882  1.4%
14  14157  3.3%
15  14530  3.4%
16  9651  2.2%
Maximum 10 values
Value  Count  Frequency (%)
127  12083  2.8%
117  5215  1.2%
116  4740  1.1%
108  11795  2.7%
99  4546  1.1%
96  6971  1.6%
95  5519  1.3%
91  5665  1.3%
85  5375  1.2%
81  23141  5.4%

country_3
Categorical

HIGH CARDINALITY
HIGH CORRELATION

Distinct60
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.3 MiB
Brazil  23141
Canada  20058
Australia  14530
United Arab Emirates  14167
United Kingdom  14157
Other values (55)  345237

Length

Max length20
Median length18
Mean length7.920786478
Min length4

Characters and Unicode

Total characters3.416.156
Distinct characters46
Distinct categories3
Distinct scripts2
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st rowIsrael
2nd rowIsrael
3rd rowIsrael
4th rowIsrael
5th rowIsrael

Common Values

Value  Count  Frequency (%)
Brazil  23141  5.4%
Canada  20058  4.7%
Australia  14530  3.4%
United Arab Emirates  14167  3.3%
United Kingdom  14157  3.3%
Qatar  12083  2.8%
Colombia  11795  2.7%
Italy  11583  2.7%
Belgium  9651  2.2%
Thailand  8249  1.9%
Other values (50)  291876  67.7%

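`country_3` reproduces `country_name` exactly in every table, and `country_2` has the same counts under three-letter codes (BRA 23141, Brazil 23141, and so on). A pandas sketch for confirming on the full data that the columns are redundant before dropping any; the miniature frame is hypothetical.

```python
import pandas as pd

# Hypothetical miniature of the three country columns.
df = pd.DataFrame({
    "country_2": ["BRA", "BRA", "CAN", "ISR"],
    "country_name": ["Brazil", "Brazil", "Canada", "Israel"],
    "country_3": ["Brazil", "Brazil", "Canada", "Israel"],
})

# Identical columns: same values in every row.
same = df["country_name"].equals(df["country_3"])

# One-to-one mapping: each code has exactly one name, and each name exactly one code.
one_to_one = bool(
    df.groupby("country_2")["country_name"].nunique().max() == 1
    and df.groupby("country_name")["country_2"].nunique().max() == 1
)

print(same, one_to_one)  # True True
```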
Length

Histogram of lengths of the category
Value  Count  Frequency (%)
united  34036  6.8%
brazil  23141  4.6%
canada  20058  4.0%
australia  14530  2.9%
arab  14167  2.8%
emirates  14167  2.8%
kingdom  14157  2.8%
qatar  12083  2.4%
colombia  11795  2.4%
italy  11583  2.3%
Other values (56)  331483  66.1%

Most occurring characters

Value  Count  Frequency (%)
a  508425  14.9%
i  295083  8.6%
n  258011  7.6%
e  246909  7.2%
r  223850  6.6%
l  178279  5.2%
t  173762  5.1%
o  171406  5.0%
d  136300  4.0%
u  113944  3.3%
Other values (36)  1110187  32.5%

Most occurring categories

Value  Count  Frequency (%)
Lowercase Letter  2845046  83.3%
Uppercase Letter  501200  14.7%
Space Separator  69910  2.0%

Most frequent character per category

Lowercase Letter
Value  Count  Frequency (%)
a  508425  17.9%
i  295083  10.4%
n  258011  9.1%
e  246909  8.7%
r  223850  7.9%
l  178279  6.3%
t  173762  6.1%
o  171406  6.0%
d  136300  4.8%
u  113944  4.0%
Other values (13)  539077  18.9%
Uppercase Letter
Value  Count  Frequency (%)
C  51581  10.3%
S  48218  9.6%
A  46438  9.3%
U  40098  8.0%
B  38720  7.7%
I  33806  6.7%
R  22518  4.5%
M  22192  4.4%
L  21239  4.2%
E  19754  3.9%
Other values (12)  156636  31.3%
Space Separator
Value  Count  Frequency (%)
(space)  69910  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Latin  3346246  98.0%
Common  69910  2.0%

Most frequent character per script

Latin
Value  Count  Frequency (%)
a  508425  15.2%
i  295083  8.8%
n  258011  7.7%
e  246909  7.4%
r  223850  6.7%
l  178279  5.3%
t  173762  5.2%
o  171406  5.1%
d  136300  4.1%
u  113944  3.4%
Other values (35)  1040277  31.1%
Common
Value  Count  Frequency (%)
(space)  69910  100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  3416156  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
a  508425  14.9%
i  295083  8.6%
n  258011  7.6%
e  246909  7.2%
r  223850  6.6%
l  178279  5.2%
t  173762  5.1%
o  171406  5.0%
d  136300  4.0%
u  113944  3.3%
Other values (36)  1110187  32.5%

finalIq
Real number (ℝ≥0)

HIGH CORRELATION

Distinct21
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean92.99102692
Minimum80
Maximum107
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size1.6 MiB

Quantile statistics

Minimum 80
5th percentile 82
Q1 86
Median 95
Q3 98
95th percentile 100
Maximum 107
Range 27
Interquartile range (IQR) 12

Descriptive statistics

Standard deviation6.581675722
Coefficient of variation (CV)0.07077753564
Kurtosis-1.059684708
Mean92.99102692
Median Absolute Deviation (MAD)4
Skewness-0.2670780865
Sum40106100
Variance43.31845531
MonotonicityNot monotonic
Histogram with fixed size bins (bins=21)
Common values
Value  Count  Frequency (%)
99  50932  11.8%
85  40694  9.4%
98  37969  8.8%
100  37185  8.6%
94  32015  7.4%
97  30544  7.1%
96  28833  6.7%
87  21735  5.0%
89  21197  4.9%
86  19449  4.5%
Other values (11)  110737  25.7%
Minimum 10 values
Value  Count  Frequency (%)
80  12083  2.8%
82  9955  2.3%
83  11795  2.7%
84  17036  4.0%
85  40694  9.4%
86  19449  4.5%
87  21735  5.0%
89  21197  4.9%
90  6062  1.4%
91  4876  1.1%
Maximum 10 values
Value  Count  Frequency (%)
107  6115  1.4%
104  12228  2.8%
100  37185  8.6%
99  50932  11.8%
98  37969  8.8%
97  30544  7.1%
96  28833  6.7%
95  13802  3.2%
94  32015  7.4%
93  11460  2.7%

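`rank` and `pop2021` each have 60 distinct values, like `country`, and their most frequent values occur exactly as often as the most frequent country (rank 81 and pop2021 213993.437 both appear 23,141 times, the count for BRA/Brazil). `finalIq` has 21 values, so some countries apparently share a score. This suggests country-level attributes repeated on every student row, which would explain the HIGH CORRELATION alerts. A sketch of testing that with pandas; the column values are made up for illustration.

```python
import pandas as pd

# Hypothetical rows: a country-level column (`rank`) and a student-level one (`math`).
df = pd.DataFrame({
    "country": ["A", "A", "B", "B", "C"],
    "rank": [10, 10, 3, 3, 7],
    "math": [480.5, 512.0, 455.1, 530.2, 401.9],
})


def is_country_level(df: pd.DataFrame, col: str, key: str = "country") -> bool:
    """True if `col` takes exactly one value within every `key` group."""
    return bool(df.groupby(key)[col].nunique(dropna=False).max() == 1)


print({col: is_country_level(df, col) for col in ["rank", "math"]})  # rank True, math False
```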
pop2021
Real number (ℝ≥0)

HIGH CORRELATION

Distinct60
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean47109.44445
Minimum343.353
Maximum332915.073
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size3.3 MiB

Quantile statistics

Minimum 343.353
5th percentile 1325.185
Q1 5465.63
Median 11632.326
Q3 60367.477
95th percentile 213993.437
Maximum 332915.073
Range 332571.72
Interquartile range (IQR) 54901.847

Descriptive statistics

Standard deviation69048.93932
Coefficient of variation (CV)1.465713301
Kurtosis4.751062076
Mean47109.44445
Median Absolute Deviation (MAD)9765.384
Skewness2.241364252
Sum2.03178323 × 10^10
Variance4767756021
MonotonicityNot monotonic
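A worked consistency check of the `pop2021` statistics, using only figures printed above: Range is Maximum minus Minimum, IQR is Q3 minus Q1, CV is the standard deviation over the mean, Variance is the squared standard deviation, and Sum (2.03178323 × 10^10) is the mean times the 431,290 rows. The tolerances allow for the rounding of the printed figures.

```python
import math

# Figures copied from the pop2021 statistics above.
minimum, maximum = 343.353, 332915.073
q1, q3 = 5465.63, 60367.477
mean, std = 47109.44445, 69048.93932
n_rows = 431290

# Each check allows for the rounding of the printed figures.
assert math.isclose(maximum - minimum, 332571.72)                # Range
assert math.isclose(q3 - q1, 54901.847)                          # Interquartile range (IQR)
assert math.isclose(std / mean, 1.465713301, rel_tol=1e-8)       # Coefficient of variation (CV)
assert math.isclose(std ** 2, 4767756021, rel_tol=1e-8)          # Variance
assert math.isclose(mean * n_rows, 2.03178323e10, rel_tol=1e-8)  # Sum
print("pop2021 summary figures are mutually consistent")
```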
Histogram with fixed size bins (bins=50)
Common values
Value  Count  Frequency (%)
213993.437  23141  5.4%
38067.903  20058  4.7%
25788.215  14530  3.4%
9991.089  14167  3.3%
68207.116  14157  3.3%
2930.528  12083  2.8%
51265.844  11795  2.7%
60367.477  11583  2.7%
11632.326  9651  2.2%
69950.85  8249  1.9%
Other values (50)  291876  67.7%
Minimum 10 values
Value  Count  Frequency (%)
343.353  3371  0.8%
442.784  3634  0.8%
628.053  5665  1.3%
634.814  5299  1.2%
1325.185  5587  1.3%
1866.942  4869  1.1%
2078.724  6406  1.5%
2689.862  6525  1.5%
2872.933  5215  1.2%
2930.528  12083  2.8%
Maximum 10 values
Value  Count  Frequency (%)
332915.073  5712  1.3%
276361.783  6513  1.5%
213993.437  23141  5.4%
145912.025  6036  1.4%
130262.216  7568  1.8%
126050.804  6647  1.5%
98168.833  5826  1.4%
85042.738  5895  1.4%
83900.473  6504  1.5%
69950.85  8249  1.9%

Interactions

2022-10-28T20:19:37.736055image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-10-28T20:19:45.318836image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-10-28T20:19:47.029030image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-10-28T20:19:48.799172image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-10-28T20:20:08.505389image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/

Correlations

Auto

The auto setting chooses an easily interpretable metric for each pair of columns based on their two variable types: Cramér's V for categorical-categorical pairs, Cramér's V on a discretized copy of the numerical column for numerical-categorical pairs, and Spearman's ρ for numerical-numerical pairs. This way, each pair of columns is measured with the most suitable method.
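The type-based dispatch above can be sketched as follows. This is a minimal illustration, not the profiling library's internal code: the helper names `auto_method` and `discretize` are hypothetical, pandas dtypes are assumed to decide whether a column counts as numerical, and ten equal-width bins are an assumed default for discretization.

```python
import pandas as pd


def auto_method(x: pd.Series, y: pd.Series) -> str:
    """Pick a pairwise correlation method from the two column types (hypothetical helper)."""
    x_num = pd.api.types.is_numeric_dtype(x)
    y_num = pd.api.types.is_numeric_dtype(y)
    if x_num and y_num:
        return "spearman"   # numerical-numerical
    return "cramers_v"      # categorical-categorical or numerical-categorical


def discretize(s: pd.Series, bins: int = 10) -> pd.Series:
    """Bin a numerical column into equal-width intervals so it can enter a contingency table.

    The default of 10 bins is an assumption made for this sketch.
    """
    return pd.cut(s, bins=bins)


df = pd.DataFrame({
    "math": [524.2, 538.2, 589.1, 532.8],
    "read": [517.3, 519.0, 621.7, 548.0],
    "gender": ["female", "male", "female", "male"],
})
print(auto_method(df["math"], df["read"]))    # spearman
print(auto_method(df["math"], df["gender"]))  # cramers_v
print(discretize(df["math"], bins=2).cat.categories.size)  # 2
```

For a numerical-categorical pair, the binned column returned by `discretize` would then be cross-tabulated against the categorical column before computing Cramér's V.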

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates a perfect negative monotonic correlation, 0 no monotonic correlation, and +1 a perfect positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
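The definition translates directly into code. A minimal sketch, assuming NumPy and pandas are available; `spearman_rho` is a hypothetical helper, and tied values receive their average rank, which is one common convention.

```python
import numpy as np
import pandas as pd


def spearman_rho(x, y) -> float:
    """Spearman's rho as the Pearson correlation of the rank variables.

    Tied values receive their average rank (pandas' default ranking method).
    """
    rx = pd.Series(x, dtype=float).rank()
    ry = pd.Series(y, dtype=float).rank()
    cov = np.cov(rx, ry)[0, 1]                 # sample covariance of the ranks (ddof=1)
    return float(cov / (rx.std() * ry.std()))  # pandas std also uses ddof=1


x = [3.1, 1.2, 5.5, 4.0, 2.7]
y = [2.0, 0.5, 9.1, 3.3, 4.4]
print(round(spearman_rho(x, y), 6))                          # 0.7
print(round(spearman_rho([1, 2, 3, 4], [1, 8, 27, 64]), 6))  # 1.0: monotonic but nonlinear
```

The second call shows why ρ suits monotonic relationships: the cubic is far from a straight line, yet its ranks match those of x exactly.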

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates a perfect negative linear correlation, 0 no linear correlation, and +1 a perfect positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables. For an exact linear relationship, this means the steepness of the line does not affect r; only the sign of its slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
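A minimal NumPy sketch that computes r exactly as described and checks the location and scale invariance on an exact linear relationship. The helper name `pearson_r` is hypothetical.

```python
import numpy as np


def pearson_r(x, y) -> float:
    """Pearson's r: covariance of X and Y divided by the product of their standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov = np.mean((x - x.mean()) * (y - y.mean()))  # population covariance (ddof=0)
    return float(cov / (x.std() * y.std()))          # np.std defaults to ddof=0, so the factors cancel


x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(round(pearson_r(x, 3 * x + 7), 6))    # 1.0
print(round(pearson_r(x, 0.1 * x - 2), 6))  # 1.0: the steepness does not matter
print(round(pearson_r(x, -2 * x), 6))       # -1.0: only the sign of the slope does
```

Using population (ddof=0) moments throughout is a design choice; any consistent choice gives the same r because the normalization cancels in the ratio.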

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates a perfect negative correlation, 0 no correlation, and +1 a perfect positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n − 1)/2 for n observations.
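A direct O(n²) sketch of that counting rule; the helper name `kendall_tau_a` is hypothetical. Dividing by all pairs gives the variant usually called τ-a. Library routines such as `scipy.stats.kendalltau` default to τ-b, which additionally adjusts the denominator for ties, so results can differ when ties are present.

```python
from itertools import combinations


def kendall_tau_a(x, y) -> float:
    """Kendall's tau as described: (concordant - discordant) / total number of pairs."""
    n = len(x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1  # both variables order the pair the same way
        elif s < 0:
            discordant += 1  # the two variables order the pair oppositely
        # s == 0 means a tie in x or y; such pairs count as neither
    return (concordant - discordant) / (n * (n - 1) / 2)


x = [3.1, 1.2, 5.5, 4.0, 2.7]
y = [2.0, 0.5, 9.1, 3.3, 4.4]
print(kendall_tau_a(x, y))  # 0.6 (8 concordant, 2 discordant, 10 pairs)
```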

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples, so a bias-corrected measure proposed by Bergsma (2013) is used instead.
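A minimal sketch of the bias-corrected statistic, assuming the correction as it is commonly stated from Bergsma (2013): φ² = χ²/n is reduced by its expected value under independence, and the table dimensions r and k are corrected in the same spirit. The helper name is hypothetical, χ² is computed without a continuity correction, and the profiling library's own implementation may differ in such details.

```python
import numpy as np
import pandas as pd


def cramers_v_corrected(x, y) -> float:
    """Bias-corrected Cramér's V for two nominal variables (hypothetical helper)."""
    table = pd.crosstab(pd.Series(x), pd.Series(y)).to_numpy(dtype=float)
    n = table.sum()
    r, k = table.shape
    # Expected counts under independence, from the row and column totals
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()  # Pearson chi-squared, no continuity correction
    phi2 = chi2 / n
    # Bias corrections for phi^2 and for the table dimensions
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return float(np.sqrt(phi2_corr / denom)) if denom > 0 else 0.0


perfect = cramers_v_corrected(["a", "b"] * 50, ["u", "v"] * 50)
independent = cramers_v_corrected(["a", "a", "b", "b"] * 25, ["u", "v", "u", "v"] * 25)
print(round(perfect, 6), round(independent, 6))  # 1.0 0.0
```

The `max(0.0, ...)` clamp is what keeps small, noise-level associations from producing a spuriously positive V.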

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for the phik package.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.
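Both displays are derived from a per-cell missingness mask. A minimal pandas sketch on a hypothetical toy frame; the report draws its own plots, so this only shows the underlying computation. Note that `DataFrame.isna()` does not treat the literal string "NA" as missing, so cells shown as NA are only counted if they were read as real nulls (`pandas.read_csv` does this for "NA" by default).

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: None marks a real null, while "NA" is stored as a literal string.
df = pd.DataFrame({
    "mother_educ": ["ISCED 3A", None, "ISCED 2", None],
    "computer": ["NA", "yes", "yes", "no"],
    "math": [524.233, 538.21, None, 532.836],
})

nullity_matrix = df.isna()                # one boolean per cell: what a nullity matrix displays
nullity_by_column = nullity_matrix.sum()  # missing cells per column: what the bar chart displays
print(nullity_by_column)

# Literal "NA" strings only count once they are converted to real nulls
print(df.replace("NA", np.nan).isna().sum()["computer"])  # 1
```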

Sample

First rows

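| | year | country | school_id | student_id | mother_educ | father_educ | gender | computer | internet | math | read | science | stu_wgt | desk | room | dishwasher | television | computer_n | car | book | wealth | escs | country_2 | country_name | rank | country_3 | finalIq | pop2021 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2015 | ISR | 37600145 | 37603048 | ISCED 3A | ISCED 3A | female | NA | NA | 524.233 | 517.321 | 517.577 | 16.66115 | yes | NA | NA | NA | NA | NA | more than 500 | NA | 0.30620000000000003 | ISR | Israel | 44 | Israel | 94 | 8789.774 |
| 1 | 2015 | ISR | 37600145 | 37602416 | ISCED 3A | NA | female | NA | NA | 538.210 | 519.036 | 523.982 | 16.66115 | yes | NA | NA | NA | NA | NA | 201-500 | NA | 1.478 | ISR | Israel | 44 | Israel | 94 | 8789.774 |
| 2 | 2015 | ISR | 37600145 | 37605312 | NA | NA | female | NA | NA | 589.116 | 621.705 | 607.757 | 16.66115 | NA | NA | NA | NA | NA | NA | more than 500 | NA | NA | ISR | Israel | 44 | Israel | 94 | 8789.774 |
| 3 | 2015 | ISR | 37600145 | 37601743 | ISCED 3A | ISCED 3B, C | female | NA | NA | 532.836 | 548.028 | 478.319 | 16.66115 | yes | NA | NA | NA | NA | NA | 101-200 | NA | 0.2243 | ISR | Israel | 44 | Israel | 94 | 8789.774 |
| 4 | 2015 | ISR | 37600145 | 37603927 | ISCED 3B, C | NA | female | NA | NA | 478.684 | 550.993 | 483.711 | 16.66115 | yes | NA | NA | NA | NA | NA | 11-25 | NA | 0.1467 | ISR | Israel | 44 | Israel | 94 | 8789.774 |
| 5 | 2015 | ISR | 37600145 | 37601586 | ISCED 3B, C | ISCED 3B, C | female | NA | NA | 574.335 | 598.832 | 558.763 | 16.66115 | yes | NA | NA | NA | NA | NA | 26-100 | NA | 0.7979 | ISR | Israel | 44 | Israel | 94 | 8789.774 |
| 6 | 2015 | ISR | 37600145 | 37605625 | NA | NA | female | NA | NA | 336.421 | 357.691 | 334.377 | 16.66115 | NA | NA | NA | NA | NA | NA | NA | NA | NA | ISR | Israel | 44 | Israel | 94 | 8789.774 |
| 7 | 2015 | ISR | 37600145 | 37605647 | ISCED 3A | ISCED 1 | female | NA | NA | 573.293 | 575.490 | 523.413 | 16.66115 | yes | NA | NA | NA | NA | NA | more than 500 | NA | 1.7141 | ISR | Israel | 44 | Israel | 94 | 8789.774 |
| 8 | 2015 | ISR | 37600145 | 37604557 | NA | NA | female | NA | NA | 532.381 | 457.606 | 496.095 | 16.66115 | yes | NA | NA | NA | NA | NA | 101-200 | NA | 0.6028 | ISR | Israel | 44 | Israel | 94 | 8789.774 |
| 9 | 2015 | ISR | 37600145 | 37600526 | ISCED 3A | ISCED 3A | female | NA | NA | 514.964 | 587.544 | 522.694 | 16.66115 | yes | NA | NA | NA | NA | NA | 26-100 | NA | 0.1493 | ISR | Israel | 44 | Israel | 94 | 8789.774 |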

Last rows

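| | year | country | school_id | student_id | mother_educ | father_educ | gender | computer | internet | math | read | science | stu_wgt | desk | room | dishwasher | television | computer_n | car | book | wealth | escs | country_2 | country_name | rank | country_3 | finalIq | pop2021 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 431280 | 2015 | GBR | 82650109 | 82652930 | ISCED 3A | ISCED 3A | female | yes | yes | 530.299 | 481.074 | 546.442 | 16.96736 | yes | yes | NA | 3+ | 3+ | 2 | 26-100 | 1.2783 | 1.0714 | GBR | United Kingdom | 14 | United Kingdom | 99 | 68207.116 |
| 431281 | 2015 | GBR | 82650109 | 82650407 | ISCED 3A | ISCED 3B, C | female | yes | yes | 499.617 | 449.433 | 451.770 | 16.96736 | no | yes | NA | 3+ | 1 | 2 | 26-100 | -0.2818 | 0.6488 | GBR | United Kingdom | 14 | United Kingdom | 99 | 68207.116 |
| 431282 | 2015 | GBR | 82650109 | 82653069 | ISCED 3A | ISCED 3A | female | yes | yes | 515.442 | 604.943 | 581.118 | 16.96736 | yes | yes | NA | 3+ | 3+ | 2 | 101-200 | 1.4826 | 1.1976 | GBR | United Kingdom | 14 | United Kingdom | 99 | 68207.116 |
| 431283 | 2015 | GBR | 82650109 | 82650368 | ISCED 2 | ISCED 3A | female | yes | yes | 518.956 | 436.519 | 468.840 | 16.96736 | yes | yes | NA | 3+ | 3+ | 2 | 201-500 | 0.7857 | 1.3324 | GBR | United Kingdom | 14 | United Kingdom | 99 | 68207.116 |
| 431284 | 2015 | GBR | 82650109 | 82654153 | ISCED 3A | ISCED 3A | female | yes | yes | 603.154 | 598.225 | 604.463 | 16.96736 | yes | yes | NA | 1 | 3+ | 1 | more than 500 | -0.1614 | 1.1585 | GBR | United Kingdom | 14 | United Kingdom | 99 | 68207.116 |
| 431285 | 2015 | GBR | 82650109 | 82654468 | ISCED 3A | ISCED 3A | female | yes | yes | 640.392 | 617.868 | 641.736 | 16.96736 | yes | yes | NA | 2 | 2 | 2 | 11-25 | 1.1876 | 0.6643 | GBR | United Kingdom | 14 | United Kingdom | 99 | 68207.116 |
| 431286 | 2015 | GBR | 82650109 | 82653067 | ISCED 3A | ISCED 3A | female | yes | yes | 568.223 | 514.913 | 520.089 | 16.96736 | yes | yes | NA | 3+ | 3+ | 1 | 201-500 | 0.4932 | 0.9365 | GBR | United Kingdom | 14 | United Kingdom | 99 | 68207.116 |
| 431287 | 2015 | GBR | 82650109 | 82651934 | ISCED 2 | ISCED 2 | female | yes | yes | 553.587 | 590.505 | 545.729 | 16.96736 | yes | yes | NA | 3+ | 3+ | 3+ | 11-25 | 2.1047 | -0.4144 | GBR | United Kingdom | 14 | United Kingdom | 99 | 68207.116 |
| 431288 | 2015 | GBR | 82650109 | 82652209 | ISCED 3A | ISCED 3A | female | yes | yes | 536.515 | 500.004 | 503.055 | 16.96736 | yes | yes | NA | 2 | 3+ | 2 | 101-200 | 1.7227 | 1.3572 | GBR | United Kingdom | 14 | United Kingdom | 99 | 68207.116 |
| 431289 | 2015 | GBR | 82650109 | 82654177 | ISCED 3A | ISCED 3B, C | female | yes | yes | 413.326 | 288.821 | 393.726 | 16.96736 | yes | yes | NA | 2 | 2 | 2 | 26-100 | 0.1245 | 1.1504 | GBR | United Kingdom | 14 | United Kingdom | 99 | 68207.116 |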